Crime Hotspot Analysis

Let’s import all the necessary libraries.

Code
# Basics
import pandas as pd
import geopandas as gpd
from sklearn.cluster import dbscan
from shapely.geometry import Point, MultiPolygon, Polygon
import time

# Data Visualization-related Libraries

## matplotlib
import matplotlib

## folium
import folium
import panel as pn

pn.extension()

Load in the pre-processed “LAcrime” data frame and the shapefile for all LAPD reporting districts.

Code
LAcrime = pd.read_parquet("LAcrime_trimmed")
LAPD_rd = gpd.read_file("./LAPD_Reporting_Districts_7103070480637650964/LAPD_Reporting_Districts.shp")

Before performing the cluster analysis with DBSCAN, we first need to transform “LAcrime” into a GeoPandas data frame with a suitable geometry.

geometry = [Point(lon, lat) for lon, lat in zip(LAcrime['LON'], LAcrime['LAT'])]
LAcrime = gpd.GeoDataFrame(LAcrime, geometry=geometry)
LAcrime = LAcrime.set_crs(epsg = 4326, inplace = True)
LAcrime = LAcrime.to_crs(epsg = LAPD_rd.crs.to_epsg())
LAcrime["x"] = LAcrime.geometry.x
LAcrime["y"] = LAcrime.geometry.y

LAcrime.head()
DATE OCC AREA Crm Cd Crm Cd Desc LAT LON Date Year Month Day Crime_Description Crime Type geometry x y
4 01/02/2010 12:00:00 AM 1 122 RAPE, ATTEMPTED 34.0387 -118.2488 2010-01-02 2010 1 Saturday Rape violent POINT (-13163396.203 4033999.675) -1.316340e+07 4.034000e+06
5 01/04/2010 12:00:00 AM 1 442 SHOPLIFTING - PETTY THEFT ($950 & UNDER) 34.0480 -118.2577 2010-01-04 2010 1 Monday Other Theft property POINT (-13164386.946 4035249.076) -1.316439e+07 4.035249e+06
6 01/07/2010 12:00:00 AM 1 330 BURGLARY FROM VEHICLE 34.0389 -118.2643 2010-01-07 2010 1 Thursday Theft from Vehicle property POINT (-13165121.655 4034026.542) -1.316512e+07 4.034027e+06
7 01/08/2010 12:00:00 AM 1 230 ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT 34.0435 -118.2427 2010-01-08 2010 1 Friday Aggravated Assaults violent POINT (-13162717.154 4034644.510) -1.316272e+07 4.034645e+06
8 01/09/2010 12:00:00 AM 1 230 ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT 34.0450 -118.2640 2010-01-09 2010 1 Saturday Aggravated Assaults violent POINT (-13165088.259 4034846.028) -1.316509e+07 4.034846e+06

Now, setting a 1,000 m threshold and a minimum sample of 5, we perform the clustering with DBSCAN for each time period. This analysis aims at exploring the evolution of crime hotspots over time.

Code
eps = 1000  # in meters
min_samples = 5

start_time = time.time()

for year in range((LAcrime["Year"].unique())[0], (LAcrime["Year"].unique())[-1] + 1):
    for month in range((LAcrime["Month"].unique())[0], (LAcrime["Month"].unique())[-1] + 1):
        temp = LAcrime[(LAcrime["Month"] == month) & (LAcrime["Year"] == year)]
        cores1, labels1 = dbscan(temp[temp["Crime Type"] == "violent"][["x", "y"]],
                                 eps = eps,
                                 min_samples = min_samples)
        cores2, labels2 = dbscan(temp[temp["Crime Type"] == "property"][["x", "y"]],
                                 eps = eps,
                                 min_samples = min_samples)
        LAcrime.loc[(LAcrime["Crime Type"] == "violent") & (LAcrime["Year"] == year) & (LAcrime["Month"] == month), "label"] = labels1
        LAcrime.loc[(LAcrime["Crime Type"] == "property") & (LAcrime["Year"] == year) & (LAcrime["Month"] == month), "label"] = labels2

elapsed_time = time.time() - start_time

print(f"Clustering completed in {elapsed_time:.2f} seconds.")
Clustering completed in 42.07 seconds.

On top of showing the centroids of each cluster, we attempt to plot the convex hulls of each cluster as well.

Code
# Group by cluster label and generate convex hulls (or other boundaries) for each cluster.
def create_cluster_geometry(cluster_points):
    return cluster_points.unary_union.convex_hull

# Apply the geometry generation
clustered_geometries = LAcrime[LAcrime["label"] != -1].groupby(["Year", "Month", "Crime Type", "label"]).apply(lambda x: create_cluster_geometry(x.geometry))

# Convert the resulting geometries into a MultiPolygon if there are multiple disjoint areas.
multipolygons = []
crime_labels = []
Year = []
Month = []

for (year, month, crime_type, label), geom in clustered_geometries.items():
    if isinstance(geom, Polygon):
        multipolygons.append(geom)
        crime_labels.append((crime_type, label))
        Year.append(year)
        Month.append(month)
    elif isinstance(geom, MultiPolygon):
        multipolygons.extend(list(geom))
        crime_labels.extend([(crime_type, label)] * len(list(geom)))
        Year.append(year)
        Month.append(month)

# Create a new GeoDataFrame to store the clusters' multipolygon geometries.
LAcrime_clustered = gpd.GeoDataFrame({"geometry": multipolygons, "Crime Type Label": crime_labels, "Year": Year, "Month": Month}, crs = LAcrime.crs)
LAcrime_clustered = LAcrime_clustered.to_crs(epsg = 4326)

Set up the features for the base map consisting of LAPD reporting districts.

Code
# Generate 21 unique colors using a colormap.
num_precs = 21
cmap = matplotlib.colormaps.get_cmap("tab20b")
prec_values = LAPD_rd["PREC"].unique()
prec_colors = {
    prec: cmap(i / num_precs) for i, prec in enumerate(sorted(prec_values))
}

# Convert RGBA to Hex for folium compatibility.
prec_colors_hex = {
    prec: f'#{int(r*255):02x}{int(g*255):02x}{int(b*255):02x}' for prec, (r, g, b, _) in prec_colors.items()
}

# Define the style function.
def style_function(feature):
    prec = feature["properties"]["PREC"]
    color = prec_colors_hex.get(prec, "#808080")
    return {
        "fillColor": color,
        "color": "black",
        "weight": 1,
        "fillOpacity": 0.6,
    }

For each time period, a hotspot map is returned. The red transparent polygons represent the convex hulls of the violent crimes clusters, and the blue polygons represent the convex hulls of the property crimes clusters. The centroids of violent and property crimes clusters are represented by red and green dots respectively. Noises for violent and property crimes are represented by grey and blue dots respectively.

Code
def hotspot_map(year, month):
    temp = LAcrime[(LAcrime["Month"] == month) & (LAcrime["Year"] == year)]
    
    # Initialize a map
    m = folium.Map(location = [34.0522, -118.2437], zoom_start = 9)
    
    # Add LAPD reporting districts.
    folium.GeoJson(
        LAPD_rd,
        style_function = style_function,
        tooltip = folium.GeoJsonTooltip(fields = ["PREC"], aliases = ["Precinct:"], localize = True),
        name = "LAPD Reporting Districts"
    ).add_to(m)
    
    # Add cluster polygons for each crime type.
    for idx, row in LAcrime_clustered[(LAcrime_clustered["Year"] == year) & (LAcrime_clustered["Month"] == month)].iterrows():
        # Access the tuple components explicitly.
        crime_type, label = row["Crime Type Label"]

        # Assign colors based on the crime type.
        color = "red" if crime_type == "violent" else "blue"

        # Create a descriptive name for each cluster layer.
        layer_name = f"{crime_type.capitalize()} Cluster {label}"

        # Add the geometry to the Folium map.
        folium.GeoJson(
            row["geometry"],
            style_function = lambda feature, color = color: {
                "fillColor": color,
                "color": color,
                "weight": 1,
                "fillOpacity": 0.25,
            },
            tooltip = f"Crime Type: {crime_type}, Cluster: {label}",
            name = layer_name
        ).add_to(m)
        
    # Step 4: Plot noise points and centroids for clusters.
    for crime_type, color, marker_color in [("violent", "grey", "red"), ("property", "blue", "green")]:
        crime_data = temp[temp["Crime Type"] == crime_type]

        # Plot noise points.
        noise_points = crime_data[crime_data["label"] == -1]
        for _, row in noise_points.iterrows():
            folium.CircleMarker(
                [row["LAT"], row["LON"]],
                radius = 2,
                color = color,
                fill = True,
                fill_color = color,
                tooltip = f'Noise Point: {crime_type}',
                name = f"{crime_type.capitalize()} Noise Points"
            ).add_to(m)

        # Plot centroids for each cluster.
        for label in crime_data["label"].unique():
            if label == -1:
                continue
            cluster_data = crime_data[crime_data["label"] == label]
            centroid_x = cluster_data["LON"].mean()
            centroid_y = cluster_data["LAT"].mean()
            folium.CircleMarker(
                [centroid_y, centroid_x],
                radius = 2,
                color = marker_color,
                fill = True,
                fill_color = marker_color,
                tooltip = f"{crime_type} Cluster {label} Centroid",
                name = f"{crime_type.capitalize()} Cluster {label} Centroid"
            ).add_to(m)
        
    # Add Layer Control to toggle layers.
    folium.LayerControl().add_to(m)
    
    # Return the map.
    return pn.panel(m, height = 500, width = 800)

As a quick example, let’s look at the hotspot maps for January 2018, 2019, and 2020.

hotspot_map(2018, 1)
hotspot_map(2019, 1)
hotspot_map(2020, 1)

We observe that there are several persistent violent crime hotspots over the years: - One at the Northwest, covering precincts 9, 10, 15, 16, 17, 19, 21. - One in the middle, covering precincts 1, 2, 3, 4, 6, 7, 11, 12, 13. - Two smaller hotspots at the South, covering precinct 5. In particular, the violent crime hotspot in the middle is consistently large, and the two violent crime hotspots in precinct 5 are consistently small. On the other hand, The violent crime hotspot at the Northwest is more volatile in terms of size; in some periods it may breakdown into several smaller pieces.

Property crime hotspots, on the other hand, are more volatile. While we can identify some roughly persistent areas, the sizes of these hotspots vary time to time. But roughly speaking, there is a major property crime hotspot spanning the entire middle precincts, and sometimes even encompasses the Northwest precincts as well. Another one can be identified at the South.

These observations are general throughout the period from 2018 onwards. However, it warrants more caution when it comes to the hotspot maps for 2024, due to potential data incompleteness.

Remark:

Same as the crime distribution maps, below is an attempt to create a panel dashboard with year and month sliders for more interactive display. While it can be executed normally in Jupyter notebook, it needs to be hosted on dynamic servers. Again, this is part of the future work.

Code
# Create the sliders for year and month
year_selector = pn.widgets.IntSlider(name = "Year", start = 2010, end = 2024, value = 2018)
month_selector = pn.widgets.IntSlider(name = "Month", start = 1, end = 12, value = 1)

# Bind the function to the widgets
interactive_hotspot_map = pn.bind(hotspot_map, year = year_selector, month = month_selector)

# Layout the widgets and map together
pn.Column(year_selector, month_selector, interactive_hotspot_map)